A computational offloading optimization scheme based on deep reinforcement learning in perceptual network

Currently, the deep integration of the Internet of Things (IoT) and edge computing has improved the computing capability of the IoT perception layer. However, existing offloading techniques for edge computing suffer from rigid, fixed offloading policies. To address this, and drawing on the characteristics of deep reinforcement learning, this paper investigates a computation offloading optimization scheme for the perception layer. The algorithm adaptively adjusts the computational task offloading policy of IoT terminals according to network changes in the perception layer. Experiments show that the algorithm effectively improves the operational efficiency of the IoT perception layer and reduces the average task delay compared with other offloading algorithms.

Response: Thank you for your comment. This paper mainly focuses on optimizing the decision between computing a task locally and offloading it from the IoT terminal to an edge server. The parameter set in the experiment is the data size of the offloaded task. The set data size is within the acceptable calculation range of the mobile edge server. When a task is offloaded, the algorithm adaptively finds a suitable edge server for the calculation, so the average delay is not greatly affected. In addition, the default size of a data block is generally 64 MB or 128 MB. The revised text is highlighted in RED.
For your convenience, the revised text is given as follows.
In addition, $s_t$ represents the data size of the offloaded task. The set data size is within the acceptable calculation range of the mobile edge server. When a task is offloaded, the algorithm adaptively finds a suitable edge server for the calculation, so the average delay is not greatly affected.
Comment # 3: In addition to the average delay, show the impact of the parameters (at least one of them) on the throughput.
Response: Thank you for your comment. This paper mainly models and conducts experiments for minimizing the average delay and maximizing the task completion rate and is not designed for throughput. In the reward setting, if the task is completed before the deadline, that is, if the task completion rate is relatively high, a high reward will be set. Otherwise, a relatively low reward will be set. During the optimization process, the solution will converge in the direction of high reward to ensure a high task completion rate and low average delay. In the future, we will also consider more objectives for experiments to maximize performance.

Response to Reviewer 2nd Comments
This paper investigates a computation offloading optimization scheme for the perception layer. The algorithm can adaptively adjust the computational task offloading policy of IoT terminals according to the network changes in the perception layer. Although the work has potential, several major concerns need to be addressed before the paper can be accepted for publication:
Comment # 1: The optimization problem in Section 3.1.3 is not clear. Specifically, it is not clear from Equation (6) what the authors are trying to optimize. How can the task completion rate be optimized by maximizing the local and service execution times?
Response: Thank you for your comment. Equation (6) is used to calculate the task completion time. Both the task delay and the task completion rate are calculated based on Equation (6). In the reward setting, if the task is completed before the deadline, that is, if the task completion rate is relatively high, a high reward will be set. Otherwise, a relatively low reward will be set. During the optimization process, the solution will converge in the direction of high reward to ensure a high task completion rate. The revised text is highlighted in RED. For your convenience, the revised text is given as follows.
Our optimization goal considers the expected task delay and the task completion rate (TCR). According to the above formula, the task completion time is: Assuming that there are P tasks in total, the expected delay of all tasks (which can also be understood as the average task completion time of P tasks) can be expressed as: Assuming that there are C tasks completed within the deadline, the task completion rate of all tasks can be expressed as: This paper aims to maximize the TCR while making the task waiting time as small as possible. Therefore, when the task is not completed within the deadline, a negative reward is given, and when the task is completed within the deadline, a high positive reward is given. We set the reward as: where $t_i^d$ represents the task deadline.
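As a minimal sketch of the quantities described above, the following Python snippet computes the average task delay as the mean completion time over P tasks, the TCR as C / P, and a deadline-based reward. The class `TaskResult`, the function names, and the reward magnitudes (`bonus`, `penalty`) are illustrative assumptions and are not taken from the manuscript, whose exact reward expression is not reproduced here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TaskResult:
    """Outcome of one task (hypothetical container for illustration)."""
    completion_time: float  # t_i, task completion time
    deadline: float         # t_i^d, task deadline


def average_delay(results: Sequence[TaskResult]) -> float:
    """Mean completion time over all P tasks."""
    if not results:
        raise ValueError("at least one task is required")
    return sum(r.completion_time for r in results) / len(results)


def task_completion_rate(results: Sequence[TaskResult]) -> float:
    """C / P, where C counts the tasks finished within their deadline."""
    if not results:
        raise ValueError("at least one task is required")
    completed = sum(1 for r in results if r.completion_time <= r.deadline)
    return completed / len(results)


def reward(result: TaskResult, bonus: float = 1.0, penalty: float = -1.0) -> float:
    """Positive reward if the deadline is met, negative otherwise.

    The values of bonus and penalty are assumed placeholders.
    """
    return bonus if result.completion_time <= result.deadline else penalty


# Four hypothetical tasks, one of which misses its deadline.
results = [
    TaskResult(2.0, 3.0),
    TaskResult(5.0, 4.0),
    TaskResult(1.0, 1.0),
    TaskResult(4.0, 6.0),
]
avg = average_delay(results)
tcr = task_completion_rate(results)
rewards = [reward(r) for r in results]
```

Because the reward depends only on whether each task meets its deadline, a policy that maximizes the cumulative reward is pushed toward a higher TCR, which matches the intent stated in the response.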
Comment # 2: Moreover, no constraints are defined for the optimization problem, which makes it incomplete.
Response: Thank you for your comment. We added constraints to the problem in model building to make the article more complete. The revised text is highlighted in RED. For your convenience, the revised text is given as follows.
In addition, we need to consider some constraints when calculating. $n$ represents the total number of tasks, and $M$ represents the total number of MECs. $i$ represents the task index, and $j$ represents the MEC index. $M_i$ denotes the MEC-available set of task $n_i$, and $M_{ij}$ denotes that task $n_i$ is executed on MEC $M_j$. $t_i$ represents the task completion time of task $n_i$. $B_{ij}$ represents the task execution time of task $n_i$ on MEC $M_j$.
Among them, constraint (6) indicates that one MEC can be selected from the set of optional MECs for each task $n_i$ for execution, but each task can only be executed on one MEC. Constraint (7) requires that the task execution time of each task $n_i$ on MEC $M_j$ cannot exceed its task completion time. Constraint (8) enforces the non-negativity of all parameters.
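One possible way to write the three constraints described above is sketched below, in the order (6), (7), (8). It assumes that $M_{ij}$ is a binary indicator that equals 1 when task $n_i$ is executed on MEC $M_j$; this reading, and the exact form of each line, are assumptions based on the verbal description and may differ from the equations in the revised manuscript.

```latex
\begin{align}
  & \sum_{M_j \in M_i} M_{ij} = 1, \quad M_{ij} \in \{0, 1\},
    && i = 1, \dots, n \\
  & B_{ij} \le t_i,
    && i = 1, \dots, n,\; M_j \in M_i \\
  & t_i \ge 0, \quad B_{ij} \ge 0,
    && i = 1, \dots, n,\; j = 1, \dots, M
\end{align}
```

The first line states that exactly one MEC is chosen from the available set of each task, the second bounds the execution time by the completion time, and the third expresses non-negativity.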

Comment # 3: It is not clear what novel contributions the authors bring by employing Deep Reinforcement Learning (DRL) for offloading. Is the internal structure of DRL tuned for the considered problem, or is it just used as a black-box approach?
Response: Thank you very much for your suggestion. This paper builds an optimization model for offloading computing tasks of IoT terminals under the perception layer network and uses deep reinforcement learning to adapt the computing task offloading strategy of IoT terminals. This paper does not adjust the internals of DDPG but uses it as a black-box approach. In future research, however, we will also consider further optimization of DDPG to make the offloading more efficient.
Response: Thank you for your comment. We carefully read several of the important references you suggested and added them to the appropriate places in the paper to make it more persuasive. For your convenience, the text with the added literature is given as follows.
The Internet of Things (IoT) has significant advantages over traditional communication technologies. However, IoT devices have limited resources [1].
Generally speaking, resource limitations are addressed by offloading computation, that is, by shifting the computational workload to other devices with better resources [2,3].
On the other hand, the extensive distribution of edge computing nodes and the powerful offloading capability [4] greatly facilitate the task computing and data transmission of IoT terminals.
Comment # 6: The environment within which the simulations have been done needs to be further clarified, i.e., programming language, framework, resource specifications, etc.
Response: Thank you for your suggestion. We further clarify the environment in which the simulation is performed, including programming languages, frameworks, resource specifications, etc., in the experimental section to complete the paper. The revised text is highlighted in RED. For your convenience, the revised text is given as follows.
The experiments in this paper are all implemented in Python 3.8 on Windows 10, and the DDPG algorithm is implemented with the TensorFlow framework. The MEC server is equipped with an Intel Core i7-7700 CPU. In the standard configuration, 30 IoT devices are connected in each case.
Thank you once again for your valuable time, insightful suggestions, and positive comments.
We hope the detailed response above has clarified the essential issues and that our revised manuscript will meet with your approval.
We look forward to any further advice you may have for us.

Sincerely,
The Authors